Efficiently Mining Homomorphic Patterns from Large Data Trees

Authors

  • Xiaoying Wu
  • Dimitri Theodoratos
  • Zhiyong Peng
Abstract

Finding interesting tree patterns hidden in large datasets is a central topic in data mining, with many practical applications. Unfortunately, previous contributions have focused almost exclusively on mining induced patterns from a set of small trees, while the problem of mining homomorphic patterns from a large data tree has been neglected. This is mainly due to the unbounded redundancy that homomorphic tree patterns can display, which makes the problem challenging. However, mining homomorphic patterns allows for discovering large patterns that cannot be extracted when mining induced or embedded patterns. Large patterns better characterize big trees, which are important for many modern applications, particularly with the explosion of big data. In this paper, we address the problem of mining frequent homomorphic tree patterns from a single large tree. We propose a novel approach that extracts nonredundant maximal homomorphic patterns. Our approach employs an incremental frequency computation method that avoids the costly enumeration of all pattern matchings required by previous approaches. Matching information of already computed patterns is materialized as bitmaps, a technique that minimizes not only memory consumption but also CPU time. We conduct detailed experiments to test the performance and scalability of our approach. The experimental evaluation shows that our approach mines larger patterns and extracts maximal homomorphic patterns from real datasets, outperforming state-of-the-art embedded tree mining algorithms applied to a large data tree.
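The abstract does not spell out the algorithm, but the reason homomorphic matching suits bitmaps can be shown with a small sketch. The assumptions here are ours, not the paper's: patterns use only parent-child edges, each data-tree node id is a bit position in a Python integer, and a pattern's frequency is taken to be the number of distinct data nodes onto which the pattern root can be mapped. Because a homomorphism need not be injective, each child subtree of a pattern node can be checked independently. This means the match set is built bottom-up with bitwise AND, and no individual matching is ever enumerated. The names `DataTree`, `match_bitmap` and `support`, and the `cache` of per-pattern bitmaps, are hypothetical illustrations of "materializing matching information as bitmaps", not the authors' data structures.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical pattern encoding (assumption): (label, (child_pattern, ...)),
# parent-child edges only. Tuples are hashable, so patterns can key a cache.
Pattern = tuple


@dataclass
class DataTree:
    """Labeled rooted tree; node id i is bit position i in every bitmap."""
    labels: list
    parent: list  # parent[i] is the parent id of node i, or -1 for the root
    _by_label: dict = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # One bitmap per label: bit i is set iff labels[i] == label.
        for i, lab in enumerate(self.labels):
            self._by_label[lab] = self._by_label.get(lab, 0) | (1 << i)

    def label_bitmap(self, label: str) -> int:
        return self._by_label.get(label, 0)

    def parents_of(self, bitmap: int) -> int:
        """Bitmap of the nodes that have at least one child set in `bitmap`."""
        out = 0
        while bitmap:
            low = bitmap & -bitmap  # isolate the lowest set bit
            p = self.parent[low.bit_length() - 1]
            if p >= 0:
                out |= 1 << p
            bitmap ^= low  # clear that bit and continue
        return out


def match_bitmap(tree: DataTree, pattern: Pattern, cache: Optional[dict] = None) -> int:
    """Bitmap of data nodes onto which the pattern root maps under some homomorphism."""
    if cache is None:
        cache = {}
    if pattern in cache:  # reuse bitmaps materialized for earlier (sub)patterns
        return cache[pattern]
    label, children = pattern
    bm = tree.label_bitmap(label)
    for child in children:
        # A homomorphism need not be injective, so each child subtree is checked
        # independently; sibling pattern nodes may share one data child.
        bm &= tree.parents_of(match_bitmap(tree, child, cache))
        if not bm:
            break
    cache[pattern] = bm
    return bm


def support(tree: DataTree, pattern: Pattern, cache: Optional[dict] = None) -> int:
    """Assumed frequency measure: number of distinct images of the pattern root."""
    return bin(match_bitmap(tree, pattern, cache)).count("1")


if __name__ == "__main__":
    # Node 0 'a' (root) has children 1 'b' and 3 'a'; node 1 has child 2 'c';
    # node 3 has child 4 'b'.
    t = DataTree(labels=["a", "b", "c", "a", "b"], parent=[-1, 0, 1, 0, 3])
    shared = {}  # bitmaps cached here are valid for tree t only
    print(support(t, ("a", (("b", ()),)), shared))            # 2
    print(support(t, ("a", (("b", ()), ("b", ()))), shared))  # 2
    print(support(t, ("a", (("b", (("c", ()),)),)), shared))  # 1
```

In this toy tree, a(b) and a(b, b) have identical match sets, because both `b` pattern nodes may map to the same data node. This is a small instance of the redundancy among homomorphic patterns that the abstract mentions, and presumably the kind a nonredundant miner has to prune. The shared cache lets a larger pattern reuse the bitmaps of sub-patterns that were already evaluated.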

Similar sources

Preservation of Privacy for Multiparty Computation System with Homomorphic Encryption

Data mining is the task of discovering significant patterns/rules/results from a large amount of data stored in databases, data warehouses, or other information repositories. Even though the focus of data-mining technology has been on the discovery of general patterns, some data-mining applications may require access to individuals' records containing sensitive private data. Abundance of reco...

Mining XML Frequent Query Patterns

With XML being the standard for data encoding and exchange over the Internet, finding interesting XML query characteristics efficiently has become a critical issue. Mining frequent query patterns is a technique for discovering the most frequently occurring query pattern trees in a large collection of XML queries. In this paper, we describe an efficient mining algorithm to discover the frequent que...

Frequent Pattern Mining in Attributed Trees

Frequent pattern mining is an important data mining task with a broad range of applications. Initially focused on the discovery of frequent itemsets, studies were extended to mine structural forms such as sequences, trees, or graphs. In this paper, we introduce a new data mining method that consists of mining a new kind of pattern in a collection of attributed trees (atrees). Attributed trees are tr...

Efficient Tree Mining Using Reverse Search

In this paper, we review our data mining algorithms for discovering frequent substructures in a large collection of semi-structured data, where both the patterns and the data are modeled by labeled trees. These algorithms, namely FREQT for mining frequent ordered trees and UNOT for mining frequent unordered trees, efficiently enumerate all frequent tree patterns without duplicates using reve...

Efficiently Methods for Embedded Frequent Subtree Mining on Biological Data

As a technology based on databases, statistics, and AI, data mining provides biological research with a useful tool for analyzing information. The key factors that influence the performance of biological data mining approaches are the large scale of biological data and the high similarity among mined patterns. In this paper, we present an efficient algorithm named IRTM for mining frequent subtrees embe...

Publication date: 2016